

# Ultrasound System Design: Part 3: Imaging Systems

Enrico Boni

Microelectronics Systems Design Lab

enrico.boni@unifi.it



- TX/RX configuration and data handling
- Digital beamforming
- Back-end processing
- Active probes configuration









- 3 possible configurations for TX/RX
  - Full matrix
  - Full TX, muxed RX
  - TX/RX mux






#### Full matrix

- All elements connected in TX
- All elements connected in RX

#### Pros:

- Best S/N
- Best flexibility in TX/RX strategy implementation

#### Cons:

- High hardware complexity
- High cost
- High bandwidth requirements






- Full TX, muxed RX
  - All elements connected in TX
  - Subset of elements connected in RX
- Pros:
  - Low RX complexity
  - Low RX bandwidth and cost
  - Low-voltage multiplexers
- Cons:
  - Less RX S/N
  - Less RX strategy flexibility






#### TX/RX mux

- Subset of elements connected in TX
- Subset of elements connected in RX
- Pros:
  - Lowest complexity
  - Lowest cost and bandwidth
- Cons:
  - Lowest S/N
  - Lowest TX/RX strategy flexibility
  - HV multiplexers



Figure legend: dashed line = Config, dash-dotted line = RX Data



Remember: when muxing on RX this number can be reduced!!



The ADC interface is usually implemented on multiple FPGA devices (32-64 channels per FPGA)



Buffer memory (DDR chips) is managed by the FPGA devices:

- DDR interface speed: 16-bit chip, 500 MHz clock (two transfers per clock cycle) → 16 Gbit/s nominal → 8-10 Gbit/s real
- 16 parallel chips are needed to handle the peak data rate for buffering 128 channels
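The chip count above can be checked with a short calculation. This is only a sketch: the ADC parameters (80 MSPS, 12 bit) are an assumption borrowed from the ULA-OP 256 example, since the text does not state them for the 128-channel case, and `chips_needed` is a hypothetical helper name.

```python
import math

def chips_needed(n_channels, fs_hz, bits_per_sample, chip_rate_bps):
    """Return (aggregate ADC data rate in bit/s, DDR chips needed to absorb it)."""
    rate = n_channels * fs_hz * bits_per_sample
    return rate, math.ceil(rate / chip_rate_bps)

# Nominal DDR rate: 16-bit bus, 500 MHz clock, two transfers per clock cycle.
nominal_bps = 16 * 500e6 * 2

# Assumed ADC parameters: 80 MSPS, 12 bit (not stated in the text for this case).
rate, n_worst = chips_needed(128, 80e6, 12, 8e9)    # 8 Gbit/s real rate per chip
_, n_best = chips_needed(128, 80e6, 12, 10e9)       # 10 Gbit/s real rate per chip

print(f"nominal chip rate: {nominal_bps / 1e9:.0f} Gbit/s")
print(f"ADC stream: {rate / 1e9:.2f} Gbit/s, chips needed: {n_best} to {n_worst}")
```

Under these assumptions the pessimistic 8 Gbit/s figure leads to the 16 chips quoted above.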

#### Example: ULA-OP 256 FE board



4 × 16-bit DDR chips → 40 Gbit/s net



1 FPGA for buffering and beamforming

32 channels/board → 32 Gbit/s

# ULA-OP 256 FE board



#### **Analog Front-End:**

- 32 TX/RX channels
- 32 AWGs on TX
- 32 ADCs 80MSPS, 12-bit

#### **FPGA (Arria V family):**

- 4 Parallel beamformers (extendable capability)
- 1 GS/s beamforming capability
- 2 GB DDR3 Memory

#### **DSPs:**

- 2 multicore TMS320C6678 DSPs (16 cores in total @ 1.2 GHz)
- 8 GB DDR3 memory

ULA-OP 256 system

## Data buffering (for storage)





Receive & Buffering

RX Beamformer



# Beamforming:

A method to electronically direct the US energy to a specific depth and direction (TX), or to selectively receive the US echoes coming from a specific depth and direction (RX)



#### How is the scan line direction set?

The **line direction** is set by using N<sub>el</sub> simultaneously active elements (covering a total aperture D = N<sub>el</sub> × pitch)



- The line direction corresponds to the axis of the selected group of $N_{el}$ elements
- The aperture is: $D = N_{el} \times pitch$
- Different line directions are obtained by selecting different groups of elements
- The distance between two consecutive lines is equal to the distance between the centers of the corresponding selected groups of elements
- The distance between lines is typically equal to the pitch
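The relation between element groups and scan lines can be sketched as follows. The array size (128 elements), group size (32) and pitch (0.3 mm) are illustrative assumptions, and `scan_line_centers` is a hypothetical helper.

```python
import numpy as np

def scan_line_centers(n_total, n_el, pitch):
    """Lateral position (m) of each scan line of a linear array.

    One line is produced per group of n_el adjacent elements; the group
    is shifted by one element from line to line.
    """
    element_x = (np.arange(n_total) - (n_total - 1) / 2) * pitch
    n_lines = n_total - n_el + 1
    return np.array([element_x[i:i + n_el].mean() for i in range(n_lines)])

centers = scan_line_centers(n_total=128, n_el=32, pitch=0.3e-3)
aperture = 32 * 0.3e-3   # D = N_el x pitch

print(f"{centers.size} lines, aperture {aperture * 1e3:.1f} mm, "
      f"line spacing {np.diff(centers).mean() * 1e3:.2f} mm")
```

Shifting the active group by one element moves the line axis by exactly one pitch, which is why the line spacing typically equals the pitch.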

#### How can the scan line be steered to a direction not perpendicular to the probe surface?



**Beam steering** can be obtained by properly delaying the individual excitation signals

- The elements are excited sequentially: first the outermost one, and then the others
- The total effect is the same as «inclining» the array
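A minimal sketch of the steering delays, assuming a linear array, a constant speed of sound of 1540 m/s and the usual plane-wave relation (delay proportional to the element position times sin θ). The function name and the numeric values are illustrative.

```python
import numpy as np

def steering_delays(n_el, pitch, theta_rad, c=1540.0):
    """Per-element TX delays (s) that steer the beam by theta_rad.

    Delays are shifted so that the first element to fire has zero delay.
    """
    x = (np.arange(n_el) - (n_el - 1) / 2) * pitch   # element positions (m)
    tau = x * np.sin(theta_rad) / c
    return tau - tau.min()

delays = steering_delays(n_el=64, pitch=0.3e-3, theta_rad=np.deg2rad(20.0))
print(f"first element: {delays[0] * 1e9:.1f} ns, last element: {delays[-1] * 1e9:.1f} ns")
```

The delays grow linearly across the aperture, so the element at one end fires first and the others follow in sequence.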

#### How can the energy be focused along the scan line?

Focusing is obtained by exciting the individual elements with delayed signals (to simulate a lens)



- The larger distance of the lateral elements from the focus can be compensated by delaying the excitation signals of the more central elements
- If each element is suitably delayed, ultrasound contributions from all elements become synchronous (in phase) in the focus
- The total effect is equal to that of an acoustic lens with the same focal distance
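The focusing delays follow from the geometry alone. The sketch below assumes a linear array focused on its own axis and a speed of sound of 1540 m/s; names and values are illustrative.

```python
import numpy as np

def focusing_delays(n_el, pitch, focal_depth, c=1540.0):
    """Per-element TX delays (s) that focus at focal_depth on the array axis.

    The outermost elements (longest path) fire first with zero delay;
    the central elements are delayed the most.
    """
    x = (np.arange(n_el) - (n_el - 1) / 2) * pitch
    path = np.sqrt(x ** 2 + focal_depth ** 2)        # element-to-focus distance
    return (path.max() - path) / c

delays = focusing_delays(n_el=64, pitch=0.3e-3, focal_depth=30e-3)
print(f"maximum delay (central elements): {delays.max() * 1e9:.1f} ns")
```

Adding each delay to the corresponding time of flight gives the same arrival time for every element, which is the in-phase condition at the focus.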

Typically, the same pulse is applied, with different delays, to all active elements:



The TX beamforming signals are individually generated by independent Transmitters:

- DACs + HV linear drivers fed by FPGA logic
- HV square wave drivers fed directly by FPGA logic
- For each active element: waveform + delay



The signal received by each element is individually delayed. Then, all delayed signals are summed together (*delay and sum*)



In «single focus beamforming» a single, fixed configuration of delays is used



In dynamic RX beamforming, the delays are dynamically updated during reception, and the delayed signals are then summed (dynamic delay and sum)
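Dynamic delay and sum can be sketched in a few lines of NumPy. This is an illustrative software model, not the hardware implementation: it assumes that sample 0 coincides with the transmit instant, that the line lies on the axis x = 0, and that delays are rounded to the nearest sample.

```python
import numpy as np

def das_dynamic(rf, element_x, fs, c=1540.0):
    """Dynamic delay-and-sum along the axis x = 0.

    rf        : (n_el, n_samples) echo data, sample 0 at transmit time
    element_x : (n_el,) lateral element positions (m)
    fs        : sampling frequency (Hz)
    """
    n_el, n_samples = rf.shape
    depth = c * (np.arange(n_samples) / fs) / 2      # depth of each output sample
    # Two-way time of flight: down to `depth` on the axis, back to each element.
    t_rx = (depth[None, :] + np.hypot(depth[None, :], element_x[:, None])) / c
    idx = np.rint(t_rx * fs).astype(int)             # delay updated at every depth
    valid = idx < n_samples
    idx = np.where(valid, idx, 0)
    samples = np.take_along_axis(rf, idx, axis=1)
    return np.sum(np.where(valid, samples, 0.0), axis=0)
```

With a single fixed focus, `t_rx` would instead be computed once for the focal depth and applied as a constant shift per channel.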







- Each row of the memory corresponds to one of the N active elements
- N counters, each initialized at the first «useful» depth
- At each clock cycle, each counter is incremented, or not, depending on the desired delay configuration
- The data read out of the memory are summed together
- Note: the delays are quantized to the ADC sampling period (1/F<sub>ADC</sub>) unless an interpolating stage is added
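The counter-based read-out described above can be modelled in software as follows. This is a behavioural sketch with hypothetical names (`counter_das`, `increments`); a real FPGA design would generate the increment flags on the fly from the delay configuration.

```python
import numpy as np

def counter_das(mem, start, increments):
    """Behavioural model of a counter-addressed delay-and-sum.

    mem        : (n_el, depth) echo memory, one row per active element
    start      : (n_el,) initial read address of each counter
    increments : (n_el, n_out) 0/1 flags; 1 means the counter advances
                 after that clock cycle, 0 means it holds its address
    """
    addr = np.array(start, dtype=int)
    rows = np.arange(mem.shape[0])
    out = np.empty(increments.shape[1])
    for k in range(increments.shape[1]):
        out[k] = mem[rows, addr].sum()     # sum the samples currently addressed
        addr += increments[:, k]           # advance (or hold) each counter
    return out

mem = np.arange(30, dtype=float).reshape(3, 10)
inc = np.ones((3, 4), dtype=int)
inc[2, 0] = 0                              # channel 2 holds once: its delay grows by one sample
print(counter_das(mem, start=[0, 1, 2], increments=inc))
```

Holding a counter for one clock cycle increases the delay of that channel by one sample relative to the others, which is how the delay profile is updated with depth.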

Example: ULA-OP 256 system FE beamformer



- Standard DAS architecture, 32 channels
- Sub-sample interpolation (down to 1/16 of the sampling period)
- 235 MHz Clock frequency



# Digital RX multi-line Beamforming

#### ULA-OP 256 system FE beamformer



#### **Echo-data Dual Port Memory:**

- 8192 samples/ch
- 32 channels

4 parallel beamformer instances

Can iterate multiple times on the same dual-port data

Line 4 (Parallel/Sequential)





# BE processing: Image formation



- The envelope must be detected
- It is convenient to do this in baseband, at a lower rate (typically, through FPGA or DSP)
- Data compression is needed to reduce the echo dynamic range (and, thus, highlight weaker echoes)
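The two steps above can be sketched with NumPy only. The choices below are assumptions made for illustration: the low-pass filter is a moving average over one carrier period (which works best when `fs / f0` is an integer), the decimation factor is 4, and the compression is logarithmic over a 60 dB dynamic range.

```python
import numpy as np

def envelope_baseband(rf, fs, f0, decim=4):
    """Envelope detection via quadrature demodulation, at a reduced rate."""
    t = np.arange(rf.size) / fs
    iq = rf * np.exp(-2j * np.pi * f0 * t)            # shift the spectrum to baseband
    taps = int(round(fs / f0))                        # one carrier period
    iq_lp = np.convolve(iq, np.ones(taps) / taps, mode="same")  # removes the 2*f0 term
    return 2.0 * np.abs(iq_lp[::decim])               # decimate, then take the magnitude

def log_compress(env, dyn_range_db=60.0):
    """Map the envelope to [0, 1] on a logarithmic scale."""
    env_db = 20.0 * np.log10(np.maximum(env, 1e-12) / env.max())
    return np.clip(env_db, -dyn_range_db, 0.0) / dyn_range_db + 1.0
```

After compression, an echo 20 dB below the strongest one still maps to about two thirds of full scale, instead of one tenth on a linear scale.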





Figure panels: RF, envelope detected, compressed







- Bursts are transmitted at *PRF* rate: for each TX burst, *one* sample of the Doppler signal is obtained (*time sampling*).
- The electronic gate selects the information backscattered only from the region of interest (*spatial sampling* → *sample volume*).

#### The PRF sets:

- The maximum depth, $D_{max}$, which may be investigated:

$$D_{max} = \frac{1}{PRF} \times \frac{c}{2}$$

- The maximum velocity, $V_{max}$, that may be detected:

$$V_{max} = \frac{c}{2 f_0 \cos\vartheta} \times \frac{PRF}{2}$$
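As a numerical illustration of the two limits set by the PRF, the sketch below evaluates them for an assumed case (PRF = 5 kHz, f0 = 5 MHz, beam-to-flow angle of 60°, c = 1540 m/s). None of these values comes from the text.

```python
import math

def doppler_limits(prf_hz, f0_hz, theta_deg=0.0, c=1540.0):
    """Return (maximum depth in m, maximum detectable velocity in m/s)."""
    d_max = (1.0 / prf_hz) * c / 2.0
    v_max = c / (2.0 * f0_hz * math.cos(math.radians(theta_deg))) * prf_hz / 2.0
    return d_max, v_max

d_max, v_max = doppler_limits(prf_hz=5e3, f0_hz=5e6, theta_deg=60.0)
print(f"D_max = {d_max * 100:.1f} cm, V_max = {v_max:.2f} m/s")
# prints: D_max = 15.4 cm, V_max = 0.77 m/s
```

Raising the PRF increases the maximum detectable velocity but reduces the maximum depth in the same proportion.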



Remember: between subsequent PRIs, what you measure is a phase shift of the echo due to the motion of the target.

You are not measuring the true Doppler effect (which requires CW excitation)



The processing pipeline is similar to that of imaging, with quadrature demodulation, but the complex I/Q samples are then taken at a certain depth (gate)



Spectral analysis of the Doppler signal allows detecting all velocity contributions within the sample volume
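A minimal sketch of this spectral analysis, assuming the gated I/Q samples (one per PRI) are already available in an array. The Hann window and the function name are illustrative choices; the velocity axis uses the standard Doppler relation v = f_d · c / (2 f0 cos θ).

```python
import numpy as np

def doppler_spectrum(iq_gate, prf, f0, theta_deg=0.0, c=1540.0):
    """Power spectrum of the slow-time I/Q signal, with a velocity axis.

    iq_gate : complex samples taken at the gate depth, one per PRI
    Returns (velocities in m/s, power), both ordered from -V_max to +V_max.
    """
    n = iq_gate.size
    spec = np.fft.fftshift(np.fft.fft(iq_gate * np.hanning(n)))
    fd = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / prf))     # Doppler frequency axis
    v = fd * c / (2.0 * f0 * np.cos(np.deg2rad(theta_deg)))
    return v, np.abs(spec) ** 2

# Synthetic single-velocity target producing a 1 kHz Doppler shift.
prf, f0, n = 8e3, 5e6, 128
iq = np.exp(2j * np.pi * 1e3 * np.arange(n) / prf)
v, power = doppler_spectrum(iq, prf, f0)
print(f"peak at {v[np.argmax(power)]:.3f} m/s")
```

Repeating this on successive blocks of PRIs and stacking the resulting spectra side by side yields a spectrogram.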



In Doppler spectrograms, subsequent spectra are grey-scale (or color) coded and displayed in adjacent vertical lines





- Whenever an active probe is used, a configuration channel is needed.
- This is typically done with digital interfaces (e.g. UART, SPI, I2C), which generate digital switching noise during transactions
- Probe configuration thus cannot be done during the analog RX phase

- Different possible configuration data:
  - Transmit configuration (pulse shape and delay)
  - TX/RX multiplexing
  - Micro-beamforming delays
  - More structured data (for more advanced active probes)



- 2 possible solutions to avoid switching noise:
  - Configuration sent between RX phases (Pulse by Pulse)
  - Configuration fully sent before operations



- Pulse by Pulse configuration:
  - The time available is between the end of RX and start of new TX
  - In critical configurations this could reduce to a few µs
  - Depending on the digital link, this greatly reduces the amount of data that can be transferred
  - Otherwise the PRI must be increased, reducing the frame rate of the system

### Example:

- 1024-element active probe, with 4:1 µ-beamformer (all the same) and programmable TX delay (8 bit) on every element
- 600 Mbit/s configuration link

1024 × 8 = 8 kbit of configuration (the RX beamforming information, being a single shared setting, is negligible)

8192 bit / 600 Mbit/s ≈ 13.7 µs of inter-RX time required
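The same arithmetic as a small helper, which makes it easy to try other link speeds or delay resolutions. The function name is hypothetical and protocol overhead is ignored.

```python
def config_time_s(n_elements, bits_per_element, link_bps):
    """Time (s) needed to send one per-element configuration over the link."""
    return n_elements * bits_per_element / link_bps

t_cfg = config_time_s(n_elements=1024, bits_per_element=8, link_bps=600e6)
print(f"{1024 * 8} bit -> {t_cfg * 1e6:.2f} us per TX/RX event")
```

If the gap between the end of RX and the next TX is shorter than this, either the link must be faster or the PRI must be lengthened.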

- Configuration fully sent before operations:
  - This requires the full amount of configuration data, for all the different TX/RX events, to be stored in-probe
  - The probe should have logic to sequence the data



- Same example as before:
  - 1024-element active probe, with 4:1 µ-beamformer (all the same) and programmable TX delay (8 bit) on every element
  - Store a 1024-TX/RX sequence for volumetric imaging

1024 × 8 = 8 kbit of configuration for every TX/RX event

8 kbit × 1024 = 8 Mbit of memory needed in-probe. In-probe ASICs typically employ static memory without access to SDRAM chips, so this could be problematic.

- A possible solution is to increase the probe «smartness»: a more advanced probe can use a smaller amount of higher-level configuration data to produce all the low-level configuration data needed.
- The future direction is in-probe digitization and data reduction, with an all-digital data link to/from the system
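One way to picture the «smartness» idea: instead of receiving 1024 delay codes, the probe could receive only the focal point coordinates and derive the per-element TX delays itself. The sketch below is purely illustrative. The 32 × 32 layout, the 0.3 mm pitch, the 5 ns delay step and the function name are all assumptions, not features of any specific probe.

```python
import numpy as np

def tx_delay_codes(focus_xyz, n_side=32, pitch=0.3e-3, lsb_s=5e-9, c=1540.0):
    """Expand a focal point (m) into 8-bit TX delay codes for an n_side x n_side array."""
    coords = (np.arange(n_side) - (n_side - 1) / 2) * pitch
    ex, ey = np.meshgrid(coords, coords, indexing="ij")
    fx, fy, fz = focus_xyz
    path = np.sqrt((ex - fx) ** 2 + (ey - fy) ** 2 + fz ** 2)
    delays = (path.max() - path) / c                  # farthest element fires first
    codes = np.rint(delays / lsb_s)
    if codes.max() > 255:
        raise ValueError("delay range does not fit in 8 bits with this step")
    return codes.astype(np.uint8)

codes = tx_delay_codes((0.0, 0.0, 30e-3))
print(f"{codes.size} delay codes ({codes.size * 8} bit) derived from 3 coordinates")
```

In this picture the link carries three numbers per TX event, and the 8 kbit of low-level data are generated inside the probe.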